music recognition (OMR). OMR is a computer vision task concerned with algorithms and models that recognize
musical notation. This study proposes a stacking ensemble learning model for OMR on common western music
notation (CWMN). The ensemble uses four deep convolutional neural networks (DCNNs), namely ResNeXt50,
Inception-V3, RegNetY-400MF, and EfficientNet-V2-S, as base classifiers. This study also analyses which
technique is most appropriate as the ensemble's meta-classifier; to this end, six machine learning techniques
are evaluated: support vector machine (SVM), logistic regression (LR), random forest (RF), K-nearest neighbor
(KNN), decision tree (DT), and Naive Bayes (NB). Six publicly available OMR datasets, namely HOMUS_V2,
Rebelo1, Rebelo2, Fornes, OpenOMR, and PrintedMusicSymbols, are combined, downsampled, and used to test the
proposed model. The proposed ensemble outperforms the ensemble model of a previous study and achieves a best
accuracy of 97.51% and a best F1-score of 97.52%, both with the LR meta-classifier.
1. INTRODUCTION 

Music is often described as structured notes in time, and musical notation is a representation that
communicates this structure visually [1]. Music is part of human culture and is passed down from generation to
generation, changing and developing along the way. This development has led to the present situation, in which
music is most commonly written in common western music notation (CWMN), now the most widely used
international way of representing music in writing. Enabling computers to read this notation as humans do is a
problem in computer vision, and solving it requires computers to detect and recognize musical symbols
reliably. This problem is known as optical music recognition (OMR). OMR is the field that studies and develops
computer algorithms and models to recognize musical notation in documents [2], and CWMN is the notation most
commonly studied. CWMN is composed of musical symbols that represent musical elements; Figure 1 shows
several of them. Besides these, CWMN also includes bar lines, staff lines, trills, and other symbols.
Figure 1. Several CWMN symbols 


OMR has several practical benefits. Recognized notation can be converted into other formats, such as
musical instrument digital interface (MIDI) for playback and MusicXML for storing sheet music documents. These
capabilities support musicians in practicing, exploring pieces, and writing songs in CWMN. All of these benefits
depend on how well the model detects and recognizes the symbols in a musical score, which in turn depends on
the classification method used and the quality of its results. Choosing an appropriate and accurate
classification method is therefore essential, and it remains an open problem in OMR research.

Researchers have proposed many models for detecting and recognizing CWMN symbols. Mejia et al. [3]
classified music sheet images using five baseline convolutional neural network architectures, namely VGG16,
MobileNet, ResNet50, Inception V3, and Inception-ResNet-V2; all five performed well, with MobileNet achieving
the highest accuracy. Other studies on datasets of music score images or staff fragments have used a deep
convolutional neural network (DCNN) with the Darknet-53 backbone in YOLO [4], the deep watershed detector
(DWD) [5], a parallel bat algorithm [6], U-Net [7], [8], and several variants of the convolutional recurrent
neural network (CRNN) [9]-[13], all with good results. Other studies perform classification at the symbol level
of CWMN, using datasets of cropped symbol images in which each image contains exactly one symbol. Some of these
combine different feature extractors with a K-nearest neighbor (KNN) classifier [14], [15]. Another applies a
texture-based feature descriptor (the DAISY descriptor), optimizes it with quantum-concept-inspired gray wolf
optimization, and compares several classifiers, namely multi-layer perceptron (MLP), KNN, Naive Bayes (NB),
random forest (RF), and sequential minimal optimization (SMO) [16]. Although these studies report good to very
good results, there is still room for improvement.

Ensemble learning has been widely used in other classification problems with excellent results [17]-[23],
but, to the best of our knowledge at the time of writing, only one study has applied it to OMR [24]. Ensemble
classification combines multiple classifiers to assign class labels to unlabeled instances [25]; the main idea
is to improve prediction performance by constructing multiple models or multiple predictions [26]. Paul et al.
[24] proposed an OMR ensemble with three pre-trained deep learning models, ResNet50, DenseNet161, and
GoogLeNet, as base classifiers, followed by a support vector machine (SVM) meta-classifier. In this study, four
DCNN models that are newer than, and reported to outperform, these three are selected as the base classifiers
of the ensemble: ResNeXt50, Inception-V3, RegNetY-400MF, and EfficientNet-V2-S.

As an improved variant of ResNet, ResNeXt has outperformed ResNet in experiments [27], [28] and performs
classification well [29], [30]. It has only a few hyperparameters to set, and increasing its cardinality can
improve classification accuracy. Inception-V3 was introduced in [31] and has performed very well in
classification tasks [32]-[34]. RegNet [35] and EfficientNet-V2 [36] are relatively new models, introduced in
2020 and 2021, respectively; both have been used in several classification studies and performed well
[37]-[42]. Based on these studies, these four models are chosen as the base classifiers of the OMR ensemble
model.

In addition, because the only previous OMR ensemble [24] used a single meta-classifier, it is necessary to
experiment with and analyse which meta-classifier performs well in an OMR ensemble model, which has not been
done so far. Six machine learning techniques are analysed in this study: SVM, logistic regression (LR), RF,
KNN, decision tree (DT), and NB. Six musical symbol datasets are used: the handwritten online music symbols
(HOMUS) version 2 [43], Rebelo1 [44], Rebelo2 [44], Fornes [45], OpenOMR [46], and PrintedMusicSymbols [47]
datasets. These datasets do not contain full music score images but cropped CWMN symbols, so the symbols are
no longer placed on staff lines. They are combined into a unified dataset. With the proposed model and these
datasets, this study focuses only on improving classification performance, without considering other criteria
such as training time and required resources. All experiments were implemented in Python 3.8.15 and run on
Google Colab Pro, using whichever GPU the server assigned (NVIDIA Tesla K80, P100, or T4). This study reports
the performance of each DCNN and of the ensemble with each meta-classifier.


2. METHOD 
2.1. Overview 

In this study, a stacking ensemble learning model is proposed for the OMR task. It consists of four DCNN
architectures in the base-classifier segment, namely ResNeXt50, Inception-V3, RegNetY-400MF, and
EfficientNet-V2-S, and one machine learning classifier in the meta-classifier segment. Each base classifier
performs the classification task independently, and their combined predictions are given to the
meta-classifier as input data. The meta-classifier then performs a second classification and produces the
final prediction for each sample in the dataset, which is the ensemble prediction. To determine the best
machine learning technique for the meta-classifier in the OMR task, six techniques are analysed: SVM, LR, RF,
KNN, DT, and NB. The ensemble is tested on a unified OMR dataset formed by combining six publicly available OMR
datasets, namely HOMUS_V2, Rebelo1, Rebelo2, Fornes, OpenOMR, and PrintedMusicSymbols. Consequently, six
experiments are carried out, one for each meta-classifier. Figure 2 shows an overview of the experimental
workflow.
Figure 2. Proposed method’s workflow overview 
2.2. Dataset 

The dataset used in this study contains images of CWMN symbols, both printed and handwritten. Pacha
and Eidenberger [48] built a tool that has made a major contribution to the OMR field: a Python package called
omrdatasettools. It can retrieve several datasets created by different researchers, including HOMUS_V2,
Rebelo1, Rebelo2, Fornes, OpenOMR, and PrintedMusicSymbols. Extracting these datasets produces six folders,
one per dataset, each containing CWMN symbol images. HOMUS_V2 and Fornes contain handwritten symbols, whereas
Rebelo1, Rebelo2, OpenOMR, and PrintedMusicSymbols contain printed symbols. Table 1 gives an overview of each
dataset.


Table 1. Dataset overview 


Dataset              Classes  Symbol type  Total images  Images per class (min-max)
HOMUS_V2             32       Handwritten  15,200        396-801
Rebelo1              30       Printed      7,940         6-897
Rebelo2              56       Printed      7,307         1-508
Fornes               7        Handwritten  4,094         471-820
OpenOMR              15       Printed      706           4-112
PrintedMusicSymbols  36       Printed      213           1-63


These datasets are then combined into a unified dataset. During combining, the classes are checked for
overlap: samples of the same symbol that were labeled under different class names are relabeled with one proper
class label and merged into the same class folder. The result is a dataset of 64 classes and 35,460 images of
musical symbols. Because of the limited resources available in this study, this dataset was downsampled to at
most 300 images per class to ease training. The result, hereinafter called the Downsampled300Unified dataset,
contains 64 classes and 12,401 CWMN symbol images (34.97% of the unified dataset). After downsampling, some
classes have too few samples; two classes even have only one sample each. Such classes cause problems when
splitting the data into training, validation, and test sets, so these classes, 14 in total, were removed. The
final Downsampled300Unified dataset therefore contains 50 classes and 12,256 CWMN symbol images (34.56% of the
unified dataset), with the smallest class having 73 samples. Figure 3 shows several sample images from the
Downsampled300Unified dataset.
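The combining, capping, and filtering steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the alias table, the `build_unified_dataset` helper, the random selection with a fixed seed, and the `min_count` threshold are all hypothetical, since the text states only the cap of 300 images per class and that 14 small classes were removed.

```python
import random
from collections import defaultdict

# Hypothetical alias table: raw class names used by different source datasets
# for the same symbol are mapped to one canonical class label.
LABEL_ALIASES = {
    "quarter": "Quarter-note",
    "Quarter-Note": "Quarter-note",
    "sharp": "Sharp",
}


def build_unified_dataset(samples, aliases, cap=300, min_count=50, seed=0):
    """Merge source datasets into one labelled pool, then cap and filter each class.

    samples   : iterable of (image_path, raw_label) pairs from all source datasets
    aliases   : mapping from raw label to canonical label (unmapped labels are kept)
    cap       : maximum number of images kept per class (300 in the study)
    min_count : hypothetical threshold; classes smaller than this are dropped
    """
    by_class = defaultdict(list)
    for path, raw_label in samples:
        by_class[aliases.get(raw_label, raw_label)].append(path)

    rng = random.Random(seed)
    unified = {}
    for label in sorted(by_class):
        paths = by_class[label]
        if len(paths) > cap:
            paths = rng.sample(paths, cap)  # assumed: random downsampling to the cap
        if len(paths) >= min_count:
            unified[label] = paths
    return unified
```

Because the cap is larger than any sensible `min_count`, capping before or after filtering gives the same class set.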


Figure 3. Sample images of the Downsampled300Unified dataset 


Figure 4 shows the distribution of images per class in the Downsampled300Unified dataset, and Table 2
lists the number of images in each class. The dataset is imbalanced. This imbalance is left as is in order to
evaluate how the model deals with imbalanced data; accordingly, no data augmentation is performed in this
study.
Figure 4. Distribution of the number of images in each class in the Downsampled300Unified dataset 


Table 2. Number of images in each class 


Class Number of images Class Number of images 
12-8-time 300 Quarter-rest 300 
2-2-time 300 Sharp 300 
2-4-time 300 Sixteenth-note 300 
3-4-time 300 Sixteenth-rest 300 
3-8-time 300 Sixty-four-note 300 
4-4-time 300 Sixty-four-rest 300 
6-8-time 300 Thirty-two-note 300 
9-8-time 300 Thirty-two-rest 300 
Accent 300 Tie-slur 300 
Barline 300 Whole-half-rest 300 
Beam 300 Whole-note 300 
C-Clef 300 Multiple-quarter-notes 243 
Common-time 300 Chord 118 
Cut-time 300 Staccatissimo 116 
Dot 300 Double-whole-rest 106 
Double-sharp 300 Fermata 102 
Eighth-note 300 Multiple-eighth-notes 96 
Eighth-rest 300 Glissando 95 
F-clef 300 Tenuto 92 
Flat 300 Mordent 86 
G-clef 300 Multiple-sixteenth-notes 85 
Half-note 300 Stopped 85 
Natural 300 Turn 81 
Other 300 Marcato 78 
Quarter-note 300 Tuplet 73 


The only pre-processing applied to the dataset is resizing: every musical symbol image is resized to
299x299 pixels. This size was chosen because it is the standard input size of Inception-V3, and the other DCNN
models can also accept inputs of this size. The images are not resized separately for each model, so that all
base classifiers receive identical inputs, as the stacking ensemble requires. No further pre-processing is
applied to enhance the images.
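A minimal sketch of the resizing step, assuming the Pillow library (the text does not name the image library). The helper name `preprocess_symbol` is hypothetical, and the conversion to three-channel RGB is an added assumption: many symbol images are grayscale or binary, while ImageNet-style DCNNs expect three input channels.

```python
from PIL import Image

# Common input size for all four base DCNNs (Inception-V3's standard input size).
TARGET_SIZE = (299, 299)


def preprocess_symbol(img: Image.Image) -> Image.Image:
    """Resize one CWMN symbol image to the shared 299x299 input size.

    The RGB conversion is an assumption of this sketch, not a step stated in the study.
    """
    return img.convert("RGB").resize(TARGET_SIZE)


# Example: a white 40x90 grayscale canvas stands in for a real symbol image.
sample = Image.new("L", (40, 90), color=255)
resized = preprocess_symbol(sample)
print(resized.size, resized.mode)  # (299, 299) RGB
```

Applying the same function to every image before any model sees it is what keeps the inputs identical across the four base classifiers.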

The dataset is then split into training, validation, and test sets in proportions of 60%, 20%, and 20%,
respectively. The training portion is limited to 60% to keep training time manageable, since four DCNNs must be
trained. All three splits are then given to each DCNN in the base-classifier segment.
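The 60/20/20 split can be obtained with two successive calls to scikit-learn's `train_test_split`, which is assumed here; the helper name and seed are hypothetical. Stratifying by class is also an assumption of this sketch. It keeps the class proportions of the imbalanced dataset similar in all three splits, and scikit-learn refuses to stratify a class that has only one sample, which is consistent with the need to remove such classes before splitting.

```python
from sklearn.model_selection import train_test_split


def split_60_20_20(X, y, seed=42):
    """Split samples into 60% training, 20% validation, and 20% test sets.

    Stratification by class is an assumption of this sketch.
    """
    # Hold out 20% of all samples as the test set.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=seed)
    # 25% of the remaining 80% equals 20% of the whole dataset (validation set).
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test
```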


2.3. Ensemble learning 

There are three commonly used ensemble methodologies: bagging, stacking, and boosting [25]. This study
applies stacking ensemble learning to the OMR task. Stacking was chosen because each base classifier is given
the whole dataset, in contrast to bagging, where each base classifier receives only a subset of the data. In
addition, the results obtained in [26] show that stacking can outperform other types of ensemble learning on
the problems studied there.

The ensemble in this study uses four DCNN models as base classifiers: ResNeXt50, Inception-V3,
RegNetY-400MF, and EfficientNet-V2-S. Each model produces its own prediction for every musical symbol image in
the dataset, and these predictions are likely to differ between models. The class predictions of all base
classifiers are therefore used as the input features of a meta-classifier, which produces the final
classification for every sample in the dataset.
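The text does not specify whether the base classifiers' predicted labels or their class-probability vectors are passed to the meta-classifier, so the following NumPy sketch (hypothetical helper `build_meta_features`) accepts either: predicted labels are one-hot encoded, probability vectors are used directly, and the per-model blocks are concatenated column-wise into one meta-feature row per image.

```python
import numpy as np


def build_meta_features(base_outputs, num_classes):
    """Concatenate the outputs of all base classifiers into meta-features.

    base_outputs: one array per base DCNN, either
      * shape (n_samples,): predicted class indices, which are one-hot encoded, or
      * shape (n_samples, num_classes): softmax probabilities, used as they are.
    Returns an array of shape (n_samples, len(base_outputs) * num_classes).
    """
    blocks = []
    for out in base_outputs:
        out = np.asarray(out)
        if out.ndim == 1:
            # One-hot encoding keeps the meta-classifier from treating class
            # indices as ordered numeric values.
            blocks.append(np.eye(num_classes)[out.astype(int)])
        elif out.ndim == 2 and out.shape[1] == num_classes:
            blocks.append(out.astype(float))
        else:
            raise ValueError(f"unexpected base-classifier output shape {out.shape}")
    return np.hstack(blocks)
```

With four base DCNNs and the 50 classes of the Downsampled300Unified dataset, each image would be described by 4 x 50 = 200 meta-features.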

The meta-classifier can be implemented with various machine learning techniques, so further analysis is
needed to find the best one for OMR ensemble learning. Six techniques that perform well in classification tasks
are used as meta-classifiers: SVM, LR, RF, KNN, DT, and NB. The classification performance of the ensemble with
each meta-classifier is then reported to determine which technique produces the best result on the OMR task.
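Evaluating the six meta-classifiers can be sketched with scikit-learn, which is assumed here because the text does not name the library. The sketch further assumes that each meta-classifier is fitted on the base models' outputs for one split (for example, the validation set) and scored on another (for example, the test set): outputs for images the DCNNs were trained on tend to be over-confident and would mislead the meta-classifier. The text does not state which split is used. Library defaults are used for all hyperparameters except the LR iteration limit, and Gaussian Naive Bayes is an assumed choice of NB variant.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
import numpy as np


def make_meta_classifiers(seed=0):
    # Hyperparameters are library defaults, not settings taken from the study;
    # max_iter is raised for LR only to avoid convergence warnings.
    return {
        "SVM": SVC(),
        "LR": LogisticRegression(max_iter=1000),
        "RF": RandomForestClassifier(random_state=seed),
        "KNN": KNeighborsClassifier(),
        "DT": DecisionTreeClassifier(random_state=seed),
        "NB": GaussianNB(),
    }


def evaluate_meta_classifiers(X_meta_fit, y_fit, X_meta_eval, y_eval):
    """Fit each meta-classifier on one split's meta-features and score it on another.

    Returns {name: (accuracy, macro-averaged F1-score)}.
    """
    results = {}
    for name, clf in make_meta_classifiers().items():
        clf.fit(X_meta_fit, y_fit)
        y_pred = clf.predict(X_meta_eval)
        results[name] = (accuracy_score(y_eval, y_pred),
                         f1_score(y_eval, y_pred, average="macro"))
    return results
```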


2.4. Evaluation 

As described in Section 2.2, the data is split into training, validation, and test sets with proportions of
60%, 20%, and 20%, respectively, and the splits are given to the ensemble model. The ensemble is evaluated at
two levels. First, the accuracy of each base classifier is computed, which reports the classification
performance of each DCNN on its own, without ensemble learning.

After the base-classifier predictions are given to the meta-classifier, it produces the final prediction
for each sample. These predictions can be summarized in a multi-class confusion matrix, from which the true
positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) of each class are obtained.
From these values, the accuracy and F1-score are calculated. Accuracy is calculated as in (1). The F1-score is
calculated for each class as in (2), and the F1-score of the whole ensemble model is the average of the
per-class F1-scores, as in (3).


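$$\text{Accuracy} = \frac{\sum_{c=1}^{n_{class}} TP_c}{n_{data}} \qquad (1)$$

$$\mathrm{F1}_c = \frac{TP_c}{TP_c + \frac{1}{2}\left(FP_c + FN_c\right)} \qquad (2)$$

$$\text{AVG F1-score} = \frac{\sum_{c=1}^{n_{class}} \mathrm{F1}_c}{n_{class}} \qquad (3)$$

where $TP_c$, $FP_c$, and $FN_c$ are the true positives, false positives, and false negatives of class $c$,
$n_{data}$ is the number of samples, and $n_{class}$ is the number of classes. In the multi-class setting, the
correctly classified samples counted in (1) are the diagonal entries of the confusion matrix.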
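Equations (1)-(3) can be computed directly from a multi-class confusion matrix. The following NumPy sketch (the helper name `metrics_from_confusion` is hypothetical) takes accuracy as the number of correctly classified samples, i.e., the diagonal of the matrix, divided by the total number of samples, computes the per-class F1-score, and averages it over classes.

```python
import numpy as np


def metrics_from_confusion(cm):
    """Return (accuracy, average F1-score) from a confusion matrix.

    cm[i, j] is the number of samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)              # TP_c: correctly classified samples of class c
    fp = cm.sum(axis=0) - tp      # FP_c: predicted as c but belonging to another class
    fn = cm.sum(axis=1) - tp      # FN_c: belonging to c but predicted as another class
    accuracy = tp.sum() / cm.sum()                                   # Eq. (1)
    denom = tp + 0.5 * (fp + fn)
    # Eq. (2); a class that never occurs and is never predicted gets F1 = 0 here.
    f1 = np.divide(tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return accuracy, f1.mean()                                       # Eq. (3)
```

For example, the 3-class matrix `[[5, 1, 0], [0, 3, 1], [0, 0, 4]]` gives an accuracy of 12/14 ≈ 0.857. From label vectors, scikit-learn's `f1_score(y_true, y_pred, average="macro")` yields the same average F1-score.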


Because six meta-classifiers are analysed and each is evaluated with two metrics, the experiment produces
12 evaluation scores. The meta-classifier with the highest scores is considered the best.

The proposed ensemble is also evaluated against a comparative ensemble model by comparing the accuracy
scores obtained with each of the six meta-classifiers. The comparative ensemble is therefore also run with the
same six machine learning techniques as its meta-classifier. In this way, the improvement provided by the
proposed ensemble can be clearly reported and assessed.


3. RESULTS AND DISCUSSION 

This study applied the proposed ensemble learning method to the OMR task using the Downsampled300Unified
dataset. The ensemble consists of four DCNN base classifiers, namely ResNeXt50, Inception-V3, RegNetY-400MF,
and EfficientNet-V2-S, and one machine learning meta-classifier. To determine the best meta-classifier for OMR,
six techniques were tested in the ensemble: SVM, LR, RF, KNN, DT, and NB.

The experiment produces three results: the performance of the four base classifiers (the DCNNs) on the
dataset, the performance of the ensemble with each meta-classifier, and, from these, the best meta-classifier.

Each base classifier is evaluated with the accuracy score. In the base-classifier segment, each DCNN
performs the classification task independently on the Downsampled300Unified dataset. Table 3 shows each DCNN’s
accuracy in classifying the CWMN symbols of this dataset. EfficientNet-V2-S outperformed the other DCNNs with
an accuracy of 0.9735, followed by Inception-V3 (0.9694), ResNeXt50 (0.9670), and RegNetY-400MF (0.9625).


Table 3. Base classifiers accuracy scores on the Downsampled300Unified dataset 


Model Accuracy score 
ResNeXt50 0.9670 
Inception-V3 0.9694 
RegNetY-400MF 0.9625 
EfficientNet-V2-S 0.9735 


After evaluating the four DCNNs, the ensemble model is evaluated through six experiments, one for each
meta-classifier. The predictions of each meta-classifier are evaluated with two metrics, accuracy and F1-score.
Table 4 shows the accuracy and F1-score of each meta-classifier on the Downsampled300Unified dataset.


Table 4. Proposed ensemble model’s evaluation metric scores using each meta-classifier 


Meta-classifier Accuracy F1 score 
SVM 0.9747 0.9748 
LR 0.9751 0.9752 
RF 0.9731 0.9732 
KNN 0.9751 0.9751 
DT 0.9580 0.9600 
NB 0.9662 0.9676 


The experiment shows that the proposed ensemble learning model achieves high evaluation scores. With four
of the six meta-classifiers (SVM, LR, RF, and KNN), both accuracy and F1-score exceed 0.97. The best accuracy,
0.9751, was achieved by both LR and KNN, and the best F1-score, 0.9752, was achieved by LR. This indicates that
the proposed ensemble performs well on the OMR task. Based on these results, LR is the best meta-classifier,
although its margins over the other meta-classifiers are small.
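Selecting the best meta-classifier from Table 4 needs a tie-break, because LR and KNN share the highest accuracy (0.9751). A minimal sketch of one such rule, ranking by accuracy first and by F1-score second (the helper name is hypothetical), reproduces the conclusion that LR is best:

```python
# Accuracy and F1-score of each meta-classifier, copied from Table 4.
TABLE_4 = {
    "SVM": (0.9747, 0.9748),
    "LR": (0.9751, 0.9752),
    "RF": (0.9731, 0.9732),
    "KNN": (0.9751, 0.9751),
    "DT": (0.9580, 0.9600),
    "NB": (0.9662, 0.9676),
}


def best_meta_classifier(scores):
    # Tuples compare element by element: accuracy decides first, and the
    # F1-score breaks ties such as the LR/KNN tie on accuracy.
    return max(scores, key=lambda name: scores[name])


print(best_meta_classifier(TABLE_4))  # LR
```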

The proposed ensemble model is then compared with the ensemble proposed in [24]. For this purpose, an
ensemble with ResNet50, DenseNet161, and GoogLeNet as base classifiers is also built and run in six
experiments, one with each of the six meta-classifiers determined in this study. This comparative model uses
the same dataset, so both ensembles are evaluated on the same data volume. The two models are compared through
their accuracy scores with each meta-classifier. Table 5 shows the comparison between the proposed model and
the comparative model.

This comparison shows that the proposed model outperformed the comparative model with every
meta-classifier. The largest accuracy margin occurred with the DT meta-classifier (0.0163), followed by RF
(0.0082) and NB (0.0077). The fourth and fifth largest margins occurred with LR (0.0053) and KNN (0.0049), and
the smallest with SVM (0.0029).
Table 5. Comparison between the proposed model and the comparative study (accuracy score of each meta-classifier)

Ensemble model          Base classifiers                                            SVM     LR      RF      KNN     DT      NB
Proposed                ResNeXt50, Inception-V3, RegNetY-400MF, EfficientNet-V2-S   0.9747  0.9751  0.9731  0.9751  0.9580  0.9662
Comparative model [24]  ResNet50, DenseNet161, GoogLeNet                            0.9719  0.9698  0.9649  0.9702  0.9417  0.9584
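The accuracy margins between the two ensembles can be recomputed from the four-decimal values in Table 5. The helper below is a hypothetical illustration that subtracts the comparative model's accuracy from the proposed model's for each meta-classifier and sorts the margins from largest to smallest.

```python
# meta-classifier: (proposed ensemble accuracy, comparative ensemble [24] accuracy)
TABLE_5 = {
    "SVM": (0.9747, 0.9719),
    "LR": (0.9751, 0.9698),
    "RF": (0.9731, 0.9649),
    "KNN": (0.9751, 0.9702),
    "DT": (0.9580, 0.9417),
    "NB": (0.9662, 0.9584),
}


def accuracy_margins(table):
    """Return (meta-classifier, margin) pairs sorted from largest to smallest margin."""
    margins = {name: round(prop - comp, 4) for name, (prop, comp) in table.items()}
    return sorted(margins.items(), key=lambda kv: kv[1], reverse=True)


for name, margin in accuracy_margins(TABLE_5):
    print(f"{name}: {margin:.4f}")
```

Recomputed this way, the ranking (DT, RF, NB, LR, KNN, SVM) matches the text, but the SVM and NB margins come out as 0.0028 and 0.0078 rather than 0.0029 and 0.0077; the small differences presumably come from rounding the underlying accuracies to four decimals.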


4. CONCLUSION 

This study aims to build an ensemble model with newer and more robust base classifiers and to analyse
several machine learning techniques as the ensemble's meta-classifier for the OMR task. It focuses only on
improving the model's classification performance, without considering other criteria such as time and
resources required. A stacking ensemble learning model was designed with four DCNNs, namely ResNeXt50,
Inception-V3, RegNetY-400MF, and EfficientNet-V2-S, followed by a machine learning meta-classifier. Six
techniques were analysed as the meta-classifier: SVM, LR, RF, KNN, DT, and NB. The model was tested on the
Downsampled300Unified dataset.

The proposed model achieved higher ensemble accuracy than the ensemble of the previous study [24] with
every meta-classifier. This shows that base classifiers with a proven record in classification tasks, such as
ResNeXt50, Inception-V3, RegNetY-400MF, and EfficientNet-V2-S, can improve the performance of an OMR ensemble
learning model. As a result, the proposed ensemble achieved high evaluation scores on the OMR task.

The experiment with the base classifiers alone showed that EfficientNet-V2-S outperformed the other models
by a small margin, with an accuracy of 0.9735. The ensemble experiments also produced strong results: with four
of the six meta-classifiers, both metrics exceeded 0.97, and the best accuracy (0.9751) and F1-score (0.9752)
were achieved with the LR meta-classifier.

This study built an ensemble learning model for the OMR task with very good results and analysed which
machine learning techniques are suitable as the meta-classifier of an OMR ensemble. It opens business
opportunities for music notation recognition software or applications that accurately recognize musical
symbols and thereby support musicians in their work. Academically, further OMR research using ensemble
learning can use this study as a reference. In addition, because OMR is similar to optical character
recognition (OCR), the proposed ensemble model can also be applied in OCR research.

Although the proposed model produced good results, several aspects can be improved in further research.
The four DCNNs used in this study are the smallest architectural variants of their kind, chosen to ease running
the ensemble experiments. Future studies could use larger variants of the same DCNN models, such as the medium
or large version of EfficientNetV2. Other DCNN models that are faster without reducing performance could also
be used, so that research would consider not only the evaluation scores but also the time cost of the model.